想像一個圖書館,書籍不是按到達日期排列,而是根據一個 通用金鑰。這正是從順序存取轉變為 關聯容器的典範轉變。與逐行掃描 vector 不同,我們改用 map 或 set 以達到對數時間的查找效率($O(\log n)$)
1. 關聯的本質
在一個 map中,我們儲存 鍵值對。鍵可作為索引,類型可以是字串、自訂物件,或任何支援 嚴格弱次序的資料類型。而 set則僅儲存唯一鍵,是進行成員測試或過濾的理想工具。
2. 有序與無序
標準容器(map, set)會維護鍵的排序。 C++11 標準 引入了無序版本(unordered_map),它使用 雜湊函數 來實現平均 $O(1)$ 的效能。使用這些高效率的桶時,請留意 C++11 標誌 標誌。
main.py
TERMINALbash — 80x24
> Ready. Click "Run" to execute.
>
QUESTION 1
What happens when you use the subscript operator
m[k] on a map if the key k does not exist?It throws an out_of_range exception.
It returns a null pointer.
It adds a new element with key k and a value-initialized value.
The program fails to compile.
✅ Correct!
Subscripting a map behaves differently from a vector; it automatically inserts missing keys.❌ Incorrect
Unlike at(), the [] operator is an insertion-on-demand mechanism.QUESTION 2
Which of the following calls is ILLEGAL for a multiset
c and a vector v?copy(v.begin(), v.end(), inserter(c, c.end()));copy(v.begin(), v.end(), back_inserter(c));copy(c.begin(), c.end(), back_inserter(v));copy(c.begin(), c.end(), inserter(v, v.end()));✅ Correct!
Correct. back_inserter requires push_back, which associative containers like multiset do not support.❌ Incorrect
Associative containers lack push_back, so back_inserter cannot be used with them.QUESTION 3
Identify the correct way to define a variable to hold the result of calling
find on a map<string, vector> >.vector<int> res = m.find("key");map<string, vector>::iterator it = m.find("key"); >pair<string, vector> p = m.find("key"); >bool found = m.find("key");✅ Correct!
find returns an iterator to the element if found, or m.end() otherwise.❌ Incorrect
The find method always returns an iterator, not the value or a boolean.QUESTION 4
What is the primary advantage of using a
set over a vector for storing a list of excluded words?A set uses less memory than a vector.
A set allows for duplicate excluded words.
A set provides O(log N) lookup instead of O(N) linear search.
A set allows elements to be accessed by index.
✅ Correct!
Using exclude.find(word) on a set is significantly more efficient than std::find on a vector for large datasets.❌ Incorrect
While a vector is simpler, its linear search ($O(N)$) becomes a bottleneck compared to a set's $O(\log N)$.QUESTION 5
In the expression
++word_count.insert({word, 0}).first->second;, what does first refer to?The key of the inserted element.
The boolean indicating if insertion was successful.
An iterator to the element with the given key.
The first element in the entire map.
✅ Correct!
map::insert returns a pair where first is an iterator to the element.❌ Incorrect
The return of insert is a pair<iterator, bool>. first is the iterator.Module Assessment: High-Performance Text Processing
Implementing the Text-Query Architecture
You are tasked with building a search engine component (Exercise 12.28). The system must read a file and allow users to query words, returning the line numbers where those words appear. You must decide on the most efficient container strategy without using custom classes initially.
Q
1. Design the data structure to hold the word-to-line-number mapping. Which combination is most efficient?
Solution:
A
A
map<string, set> > is the ideal choice. The map allows $O(\log N)$ lookup for the word (the key), and the set stores line numbers (values) uniquely and in ascending order, facilitating clean result output.Q
2. Write a code snippet to implement the word counting logic that ignores case and punctuation (Required: 100 words).
Solution:
To implement case-insensitive word counting:
To implement case-insensitive word counting:
for (char &c : word) c = tolower(c);
word.erase(remove_if(word.begin(), word.end(), ispunct), word.end());
++word_count[word];
This logic iterates through each character of the string, converting it to lowercase using tolower. Then, it utilizes the remove_if algorithm with ispunct to shift all punctuation marks to the end of the string, followed by erase to truncate them. Finally, it uses the map's subscript operator to increment the count. This ensures that 'Example!', 'example.', and 'Example' are all normalized to 'example' and counted under a single unique key, maintaining data integrity.Q
3. How would you modify the system to safely share this data with a 'QueryResult' object without duplicating the entire map?
Solution:
Use
Use
shared_ptr. Specifically, the QueryResult should hold a shared_ptr<set> >. By using QueryResult(sought, loc->second, file) where the second argument is a shared pointer to the line number set, the data remains valid even if the original TextQuery object's scope ends, while avoiding expensive deep copies.